Problems with Claude Code Agentic Workflow
INTRO
I’m sure everyone’s had Claude Code pop up in their feed over the last couple of weeks. Maybe you’ve seen one of mine - a short proof-of-concept agentic workflow that went through multiple transcripts to generate a report.
It’s easy to see why this is so exciting. It comes very close to the promised land - an assistant that works in the background, fulfilling whatever request you have for it.
I’ve spent the last 3-4 days trying to make this workflow …work.
I regret to inform you that based on some extensive testing, we are not quite there yet. Maybe in a few months, but not today.
CORE PROBLEMS
Here are some of the core problems.
Context Limit
This was by far the most common issue that I ran into. It hits you at three levels.
- MCPs have a tight token limit.
- Claude Code’s PDF read function also has a tight page limit (100 pages) - see the chunking sketch below.
- There’s also the global daily / weekly rate limit that’s about to roll out. This is actually less of an issue because you can just switch to API credits, but I’m sure there’ll be friction when switching between the two modes.
Many times, this leads to failed function calls, failed subagent routines - and failed orchestration when the error detection / retry logic bugs out.
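One workaround for the 100-page PDF limit is to pre-chunk files before they ever reach Claude Code. A minimal sketch, assuming pypdf is installed - the chunk size and helper name are my own, nothing built into Claude Code:

```python
# Split an oversized PDF into chunks that stay under the 100-page read limit,
# so each individual read call has a chance of succeeding.
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def split_pdf(path: str, out_dir: str, max_pages: int = 90) -> list[Path]:
    """Write sequential chunks of at most `max_pages` pages and return their paths."""
    reader = PdfReader(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    for start in range(0, len(reader.pages), max_pages):
        writer = PdfWriter()
        for i in range(start, min(start + max_pages, len(reader.pages))):
            writer.add_page(reader.pages[i])
        chunk_path = out / f"{Path(path).stem}_p{start + 1}.pdf"
        with open(chunk_path, "wb") as f:
            writer.write(f)
        chunks.append(chunk_path)
    return chunks
```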
Bugs with Gemini CLI
You can try to get around the context limit by using Gemini - the GOAT when it comes to long context stuff.
But unfortunately, Gemini CLI itself has a bunch of bugs.
- File write permission is not persistent.
- It has trouble locating files and automatically defaults to web_search even when told not to use it and to look for files in local directory.
- Many times, it forgets that its read tool can process PDFs - sometimes being reminded of this helps, but other times it still refuses. This results in failed calls.
Bugs(ish) with Claude Code
Even when you’re not dealing with the context window limit - and are working only with Claude Code + subagents - you run into many issues during long agentic workflows.
The primary issue is subagents hanging or failing. Sometimes, after 10-15 calls, the process hangs entirely.
This is fixable by just stopping and telling it to continue - but for a process that’s supposed to run in the background, this kind of babysitting is a deal breaker.
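A crude stall detector is easy enough to sketch - nothing Claude Code specific, it just polls a log or output file’s modification time and shouts when nothing has changed for a while - but the fact that you’d need to bolt this on at all is the point:

```python
# Flag a likely hang by watching a file the workflow is supposed to be
# appending to. Purely a generic mtime poll - no Claude Code internals.
import os
import time

def watch_for_stall(path: str, stall_seconds: int = 300, poll_seconds: int = 30) -> None:
    while True:
        idle = time.time() - os.path.getmtime(path)
        if idle > stall_seconds:
            print(f"WARNING: {path} unchanged for {int(idle)}s - run may be hung")
        time.sleep(poll_seconds)
```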
I am also fairly certain there is some sort of memory leak going on with subagents or MCP calls, but I haven’t dug into it enough.
There are also performance issues - for example, PDFs are read only one at a time. This is different from how any of the web-based tools work, and it’s very inefficient - calls get very slow when you have a bunch of small files to read.
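For comparison, reading a batch of small PDFs concurrently from a plain script is trivial. A sketch, again assuming pypdf - the function names are mine:

```python
# Extract text from many small PDFs concurrently instead of strictly one at a time.
from concurrent.futures import ThreadPoolExecutor
from pypdf import PdfReader

def read_pdf(path: str) -> str:
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def read_all(paths: list[str], workers: int = 8) -> dict[str, str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(read_pdf, paths)))
```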
Debugging / Iteration
The bigger problem is that debugging and iteration are extremely difficult.
Each attempt can take 30-45 minutes to complete, and that’s when it runs correctly.
Because many of the bugs only show up after the workflow has been running for a while, they go undetected for a long time - leading to a very long feedback cycle.
Going through the logs is very rough. Many times, they don’t even load properly in the terminal window because they’re so long.
Often the logs aren’t detailed enough to tell exactly what went wrong.
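The best I could do was grep the failures out rather than scroll through the whole transcript. A throwaway triage script along these lines - the keywords and log path are placeholders, not Claude Code’s actual log schema:

```python
# Print only failure-looking lines (with line numbers) from a huge log file,
# since the full transcript is too long to scroll through in a terminal.
import re
import sys

PATTERN = re.compile(r"error|failed|timed? ?out|retry", re.IGNORECASE)

def triage(log_path: str) -> None:
    with open(log_path, errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            if PATTERN.search(line):
                print(f"{lineno}: {line.rstrip()[:300]}")

if __name__ == "__main__":
    triage(sys.argv[1])
```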
Given this, it’s extremely frustrating to debug and improve workflows.
General Problem with Agentic Workflows
The biggest issue I ran into, however, wasn’t even technical. It was a broader, almost philosophical one.
By using an agentic workflow, I was attempting to mix non-deterministic problem solving into a scaled, automated workflow. After all, that’s the entire reason for using Claude Code - it can make basic decisions and solve unforeseen problems (as long as they aren’t too difficult) in an automated fashion, without me scripting every last edge case into the code.
The problem is that there are often multiple ways to solve a problem. And while Claude’s solution may be a reasonable one, it was often one that did not fit the exact context.
One small example is how it handled download request failures. Sometimes, Claude would simply call the download function again - which worked. Other times, it would try to search for the file online instead. And sometimes, it just gave up entirely.
Another example: Claude sometimes decided - based on the title alone - that certain documents weren’t important enough, and would read only one of the four downloaded documents.
These are all things a reasonable analyst might do - but their mistakes would be caught in real time, they would learn the working context, and - if they’re a good analyst - they would learn from it. Instead, I found myself prompting defensively, spelling out every last detail of exactly what to do.
At that point, why am I even using an agentic workflow? I might as well code it up myself.
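To make “code it up” concrete: the download handling I actually wanted is a few lines of deterministic retry logic. A sketch - `download` stands in for whatever fetch function the workflow already has, and the retry counts are arbitrary:

```python
# The deterministic behaviour I wanted: retry the same download a fixed number
# of times with backoff, then fail loudly - never silently search the web or
# quietly skip the document.
import time

def download_with_retry(download, url: str, attempts: int = 3, backoff_seconds: float = 2.0) -> bytes:
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return download(url)
        except Exception as exc:  # narrow to your real error type in practice
            last_error = exc
            time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError(f"download failed after {attempts} attempts: {url}") from last_error
```

It behaves the same way every run, which is exactly what the agent didn’t.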
FINAL THOUGHTS
One of the issues with writing a review like this is that, at the pace these tools are moving, many of the problems may be fixed very soon. A few weeks from now, every technical problem I mentioned above could be gone.
But that last one - where you basically have to write detailed instructions on exactly what to do for every specific problem you might hit - that’s a tough one. Combined with slow iteration and nightmarish debugging, it makes me wonder whether there’s any future in this workflow pattern at all.
For now, I’m going back to scripting everything by hand, and making the creation of custom workflows as seamless as possible.
Onwards.